Foreword






This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The present document introduces the set of default codecs applicable to 3G packet switched conversational multimedia
applications within the 3GPP system.

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit: 

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the document. 



Introduction 



The present document specifies the default multimedia codecs to be used within the 3GPP-specified IP
Multimedia Subsystem (IM Subsystem). The IM Subsystem includes, in particular, the conversational IP
multimedia services, whose service architecture, call control and media capability control procedures are defined
in 3GPP TS 24.229 [15] and are based on the 3GPP-adopted version of the IETF Session Initiation Protocol
(SIP).

The term codec is usually associated with a single media type. In the packet switched transport domain, on which the IM
Subsystem depends, the individual media types are encoded independently and packetised into separate
Real-time Transport Protocol (RTP) packets. These packets are then transported end-to-end inside UDP datagrams over
real-time IP connections that have been negotiated and opened between the terminals during SIP session establishment,
as specified in 3GPP TS 24.229 [15].

From the codec definition viewpoint, UEs operating within the IM Subsystem need to provide encoding and decoding of
the specified codecs and perform the corresponding packetisation and depacketisation functions. The logical binding
between the media streams is handled in the SIP session layer, and inter-media synchronisation in the receiver is handled
using RTP timestamps.
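Inter-media synchronisation with RTP timestamps can be sketched as follows. Each stream's RTP timestamps run on that stream's own clock from a random initial value, so this sketch assumes the usual RFC 3550 approach of mapping them to the sender's wallclock through the (NTP time, RTP timestamp) pair carried in each stream's latest RTCP Sender Report. The 8 kHz (AMR) and 90 kHz (video) clock rates are typical values, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    """Timing pair from the latest RTCP Sender Report of one RTP stream."""
    ntp_seconds: float  # sender wallclock time (NTP timestamp converted to seconds)
    rtp_timestamp: int  # RTP timestamp sampled at the same instant


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport, clock_rate: int) -> float:
    """Map an RTP timestamp to sender wallclock seconds via the stream's SR."""
    # RTP timestamps are 32-bit and wrap around: use the signed difference mod 2**32.
    diff = (rtp_ts - sr.rtp_timestamp) % (1 << 32)
    if diff >= 1 << 31:
        diff -= 1 << 32
    return sr.ntp_seconds + diff / clock_rate


def video_minus_audio(audio_ts: int, audio_sr: SenderReport,
                      video_ts: int, video_sr: SenderReport) -> float:
    """Seconds by which a video frame's sampling instant lies after an audio frame's."""
    audio_time = rtp_to_wallclock(audio_ts, audio_sr, 8000)   # AMR: 8 kHz RTP clock
    video_time = rtp_to_wallclock(video_ts, video_sr, 90000)  # video: 90 kHz RTP clock
    return video_time - audio_time


audio_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=16000)
video_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=900000)
# Both frames were sampled 1 s after their Sender Reports, so they are aligned.
print(video_minus_audio(24000, audio_sr, 990000, video_sr))  # 0.0
```

A receiver would delay whichever stream is ahead by the computed offset before rendering; the wrap-around handling keeps the mapping valid across the 32-bit timestamp rollover.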

Finally, since 3GPP networks are inherently error prone, the individual codecs within the IM Subsystem must also provide
error detection and/or correction: each codec has a comprehensive view of the bit stream it produces and can
therefore apply the most efficient form of error detection and/or correction.
1 Scope



The present document introduces the set of default codecs for packet switched conversational multimedia applications
within the 3GPP IP Multimedia Subsystem. Visual and sound communication are specifically addressed. The intended
applications are assumed to require low-delay, real-time functionality.

The present document is applicable to, but not limited to, services such as PS video telephony, Push to talk over
Cellular (PoC) and Combined CS and IMS services (CSI).

The applicability of this specification to GERAN is FFS. 
3 Definitions and abbreviations

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply:

3G PS multimedia terminal: terminal based on IETF SIP/SDP internet standards modified by 3GPP for purposes of 
3GPP packet switched network based multimedia telephony 

3.2 Abbreviations 

For the purposes of the present document, the following abbreviations apply: 

AMR Adaptive Multi-Rate codec

AVC Advanced Video Codec 

CSI Combination of CS and IMS services 

DSR Distributed Speech Recognition 

IETF Internet Engineering Task Force 

IM Subsystem Internet protocol Multimedia Subsystem 

ITU-T International Telecommunication Union - Telecommunication Standardization Sector

PoC Push to talk over Cellular 

RFC IETF Request For Comments 

RTCP RTP Control Protocol 

RTP Real-time Transport Protocol 

SDP Session Description Protocol 

SES Speech Enabled Services 

SIP Session Initiation Protocol



4 General



3G PS multimedia terminals provide real-time video, audio, or data, in any combination, including none, over 3GPP IM 
Subsystem. Terminals are based on IETF defined multimedia protocols SIP, SDP, RTP and RTCP. Communication 
may be either 1-way or 2-way. Such terminals may be part of a portable device or integrated into an automobile or other 
non-fixed location device. They may also be fixed, stand-alone devices; for example, a video telephone or kiosk. 
Multimedia terminals may also be integrated into PCs and workstations. 

In addition, interoperation with other types of multimedia telephone terminals, such as 3G-324M terminals, may be
possible; however, in such a case a media gateway function supporting 3G-324M to IM Subsystem interworking will be
required within or outside the IM Subsystem.

For IMS terminals capable of Combined CS and IMS (CSI) operation [46] [47], Annex A of specification [48] provides 
information on how to combine IMS media with CS calls. 
5 System overview

The present document describes the required codec-related elements for 3G PS multimedia terminals:

• codecs for 3G PS multimedia terminals;

• media encapsulation and decapsulation rules for each codec. 



6 Functional requirements



The SIP protocol itself does not mandate any codecs. Standardisation of codecs does not prevent the use of other codecs
that can be signalled using SDP. 3G PS multimedia terminals shall be able to use the same audio and video
codecs as applied in 3G-324M [8]. This ensures interoperability with 3G circuit switched multimedia telephony.

6.1 Audio 

3G PS multimedia terminals offering audio communication (including PoC services) shall support the AMR narrowband
speech codec [9] to [12].

The AMR wideband speech codec shall be supported when the 3G PS multimedia terminal supports wideband speech 
working at 16 kHz sampling frequency [16], [17], [39], [40]. 

The use of the telephone-event media format [36] is recommended for DTMF.

Annex D provides guidelines for using audio in the context of PoC services.

6.2 Video 

3G PS multimedia terminals offering video communication shall support ITU-T recommendation H.263 [6] [19] 
baseline (Profile 0) Level 45. 

H.263 [6] [19] version 2 Interactive and Streaming Wireless Profile (Profile 3) Level 45 should be supported. 

ISO/IEC 14496-2 [13] (MPEG-4 Visual) Simple Profile at Level 3 should be supported with the following constraints:

- The number of Visual Objects supported shall be limited to 1.

- The maximum frame rate shall be 30 frames per second.

- The maximum f_code shall be 2.

- The intra_dc_vlc_threshold shall be 0.

- The maximum horizontal luminance pixel resolution shall be 352 pels/line.

- The maximum vertical luminance pixel resolution shall be 288 pels/VOP.

- If AC prediction is used, the following restriction applies: the QP value shall not be changed within a VOP (or within
a video packet if video packets are used in a VOP). If AC prediction is not used, there are no restrictions on changing
the QP value.
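The constraints above lend themselves to a simple configuration check, for example before an encoder is started. The sketch below assumes a hypothetical encoder configuration record (the field names are illustrative, not from any real encoder API) and checks only the additional constraints listed in this clause, not the Simple Profile Level 3 limits themselves such as bit rate or buffer size.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mpeg4EncoderConfig:
    """Hypothetical encoder settings; field names are illustrative, not from any API."""
    visual_objects: int
    frame_rate: float
    f_code: int
    intra_dc_vlc_threshold: int
    width: int   # luminance pels per line
    height: int  # luminance lines per VOP
    ac_prediction: bool
    qp_changes_within_vop: bool  # rate control may alter QP inside a VOP/video packet


def constraint_violations(cfg: Mpeg4EncoderConfig) -> List[str]:
    """Return the clause 6.2 MPEG-4 Visual constraints that cfg would violate."""
    problems = []
    if cfg.visual_objects > 1:
        problems.append("more than 1 Visual Object")
    if cfg.frame_rate > 30:
        problems.append("frame rate above 30 frames/s")
    if cfg.f_code > 2:
        problems.append("f_code above 2")
    if cfg.intra_dc_vlc_threshold != 0:
        problems.append("intra_dc_vlc_threshold is not 0")
    if cfg.width > 352:
        problems.append("more than 352 pels/line")
    if cfg.height > 288:
        problems.append("more than 288 lines/VOP")
    if cfg.ac_prediction and cfg.qp_changes_within_vop:
        problems.append("QP changes within a VOP while AC prediction is used")
    return problems
```

For example, a CIF configuration at 15 frames/s with f_code 1, AC prediction enabled and a fixed QP per VOP yields an empty list.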

H.264 (AVC) [41] Baseline Profile at Level 1.1 [42] should be supported with constraint_set1_flag=1 and without
requirements on output timing conformance (Annex C of [41]). Each sequence parameter set of H.264 (AVC) shall
contain the vui_parameters syntax structure including the num_reorder_frames syntax element set equal to 0.
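A terminal can check an offered H.264 (AVC) configuration against the profile and level requirement above by inspecting the profile-level-id MIME/SDP parameter of the payload format [43], whose three octets carry profile_idc, a profile-iop octet holding the constraint_setN_flag bits (constraint_set0_flag in the most significant bit), and level_idc. The sketch below is illustrative only: function and constant names are hypothetical, and it does not verify the num_reorder_frames requirement, which would need a parse of the sequence parameter set VUI.

```python
from typing import Tuple

BASELINE_PROFILE_IDC = 66     # profile_idc of the H.264 Baseline Profile
CONSTRAINT_SET1_FLAG = 0x40   # second most significant bit of profile-iop
CONSTRAINT_SET3_FLAG = 0x10   # fourth bit; with level_idc 11 in Baseline it signals Level 1b
LEVEL_1_1_IDC = 11            # level_idc of Level 1.1


def parse_profile_level_id(plid: str) -> Tuple[int, int, int]:
    """Split a profile-level-id (6 hex digits) into profile_idc, profile-iop, level_idc."""
    raw = bytes.fromhex(plid)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be 3 bytes (6 hex digits)")
    return raw[0], raw[1], raw[2]


def meets_baseline_level_1_1(plid: str) -> bool:
    """True if plid signals Baseline with constraint_set1_flag=1 at Level 1.1 or higher."""
    profile_idc, iop, level_idc = parse_profile_level_id(plid)
    if profile_idc != BASELINE_PROFILE_IDC or not (iop & CONSTRAINT_SET1_FLAG):
        return False
    if level_idc == LEVEL_1_1_IDC and iop & CONSTRAINT_SET3_FLAG:
        return False  # Level 1b, which is below Level 1.1
    return level_idc >= LEVEL_1_1_IDC
```

For instance, "42400b" (Baseline, constraint_set1_flag=1, Level 1.1) passes, while "42000b" fails because constraint_set1_flag is not set.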

The H.264 (AVC) decoder in a PSS client shall start decoding immediately when it receives data (even if the stream 
does not start with an IDR access unit) or, alternatively, no later than when it receives the next IDR access unit or the next
recovery point SEI message, whichever is earlier in decoding order. The decoding process for a stream not starting with 
an IDR access unit shall be the same as for a valid H.264 (AVC) bitstream. However, the client shall be aware that such 
a stream may contain references to pictures not available in the decoded picture buffer. The display behaviour of the 
client is out of scope of this specification. 
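The alternative start rule above can be sketched as a scan over received NAL units for the first coded slice of an IDR picture (nal_unit_type 5) or the first SEI NAL unit (nal_unit_type 6) carrying a recovery point SEI message (payloadType 6). The sketch only locates that point and is deliberately simplified: it assumes NAL units have already been depacketized without start codes, does not remove emulation prevention bytes, assumes the final byte of an SEI NAL unit is the rbsp_trailing_bits byte, and does not guard against malformed input. All names are hypothetical.

```python
from typing import List, Optional, Tuple

NAL_TYPE_IDR = 5          # coded slice of an IDR picture
NAL_TYPE_SEI = 6          # supplemental enhancement information
SEI_RECOVERY_POINT = 6    # payloadType of the recovery point SEI message


def _read_sei_value(nal: bytes, pos: int) -> Tuple[int, int]:
    """Read an SEI payloadType/payloadSize: each 0xFF byte adds 255, the next byte ends it."""
    value = 0
    while nal[pos] == 0xFF:
        value += 255
        pos += 1
    return value + nal[pos], pos + 1


def sei_payload_types(nal: bytes) -> List[int]:
    """payloadType of every SEI message in an SEI NAL unit (simplified parser)."""
    types = []
    pos, end = 1, len(nal) - 1  # skip the NAL header; last byte = rbsp_trailing_bits
    while pos < end:
        payload_type, pos = _read_sei_value(nal, pos)
        payload_size, pos = _read_sei_value(nal, pos)
        types.append(payload_type)
        pos += payload_size
    return types


def latest_start_index(nals: List[bytes]) -> Optional[int]:
    """Index of the first IDR NAL unit or recovery point SEI, whichever is earlier."""
    for i, nal in enumerate(nals):
        nal_type = nal[0] & 0x1F
        if nal_type == NAL_TYPE_IDR:
            return i
        if nal_type == NAL_TYPE_SEI and SEI_RECOVERY_POINT in sei_payload_types(nal):
            return i
    return None
```

A decoder taking the alternative option could discard NAL units before the returned index; one that starts immediately would ignore it.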
NOTE 1: Terminals may use full-frame freeze and full-frame freeze release SEI messages of H.264 (AVC) to
control the display process.

NOTE 2: An H.264 (AVC) encoder should code redundant slices only if it knows that the far-end decoder makes 
use of this feature (which is signalled with the redundant-pic-cap MIME/SDP parameter as specified in
[43]). H.264 (AVC) encoders should also pay attention to the potential implications on end-to-end delay. 

NOTE 3: If a codec is supported at a certain level, then all (hierarchically) lower levels shall be supported as well.
Examples of lower levels include Level 10 for H.263 Profiles 0 and 3, Level 0 for MPEG-4 Visual Simple
Profile and Level 1 for H.264 (AVC) Baseline Profile. However, since Level 20, for instance, is not
hierarchically lower than Level 45 of H.263 Profiles 0 and 3, support for Level 45 does not imply support
for Level 20.

NOTE 4: All levels are minimum requirements. Higher levels may be supported and used for negotiation. 

NOTE 5: If a codec is supported at a certain level, it implies that on the receiving side, the decoder is required to 
support the decoding of bitstreams up to the maximum capability of this level. On the sending side, the 
support of a particular level does not imply that the encoder may produce a bitstream up to the maximum 
capability of the level. 

6.3 Real time text 

3G PS multimedia terminals offering real time text conversation should support ITU-T Recommendation T.140 [25] 
Text Conversation presentation coding. 

6.4 Interactive and background data 

SIP signalling can also be used to initiate packet switched interactive or background class reliable data services.
However, the specification of such data services is outside the scope of the present document.



6.5 Speech Enabled Service 



3G PS multimedia terminals offering speech enabled services should support the DSR Extended Advanced Front-end
codec [37].

Speech enabled services may also be supported with the AMR or AMR-WB audio codecs; however, it is noted that DSR
offers a substantial performance advantage [45].



7 Call control 



Functional requirements for call control are specified in 3GPP TS 23.228 [20]. 

The required signalling functions and call control protocols are specified in 3GPP TS 24.229 [15]. 

8 Bearer control 

The media control is based on the declaration of terminal media capability sets in the SDP part of the appropriate SIP messages.

The relation between application-level SDP signalling and radio access bearer assignment is defined outside the present
document. The QoS architecture and concept for WCDMA and GERAN are specified in 3GPP TS 23.107 [21]. The
end-to-end QoS framework involving GPRS and UMTS is specified in 3GPP TS 23.207 [22]. The applicable general QoS
mechanism and service description for GPRS in GSM and UMTS are specified in 3GPP TS 23.060 [23].
9 Multimedia stream encapsulation

9.1 MIME media types 

The terminal shall declare the mandatory and any optional media streams using the codec specific MIME media types 
in the associated SDP syntax. The MIME media types for the mandatory and optional codecs shall be according to the 
corresponding types registered by IANA. 

AMR narrowband speech codec MIME media type is specified in annex B.

AMR wideband speech codec MIME media type is specified in annex B.

H.263 [6] video codec MIME media type is specified in annex C.

MPEG-4 visual simple profile level MIME media type is specified in RFC 3016 [5].

H.264 (AVC) video codec MIME media type is specified in [43].

ITU-T Recommendation T.140 [25] Text Conversation MIME media type is specified in RFC 4103 [49].

Telephone-event MIME media type is specified in RFC 2833 [36].

DSR MIME media type is specified in [38].
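As an illustration of declaring media types in SDP, the sketch below assembles SDP media descriptions with a=rtpmap and a=fmtp attributes using some of the MIME subtypes listed above (AMR, telephone-event, H263-2000, H264). The ports, dynamic payload type numbers and fmtp values are hypothetical examples, and the helper names are not from any library; the result is not a complete SDP offer, since session-level lines, connection data and bandwidth modifiers are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RtpCodec:
    payload_type: int             # dynamic RTP payload type (96-127)
    encoding_name: str            # MIME subtype, e.g. "AMR" or "H263-2000"
    clock_rate: int               # RTP clock rate in Hz
    channels: Optional[int] = None
    fmtp: Optional[str] = None    # format-specific parameters, if any


def media_section(media: str, port: int, codecs: List[RtpCodec]) -> str:
    """Build an SDP m= line with its a=rtpmap/a=fmtp attributes (CRLF-terminated)."""
    payload_types = " ".join(str(c.payload_type) for c in codecs)
    lines = [f"m={media} {port} RTP/AVP {payload_types}"]
    for c in codecs:
        rtpmap = f"{c.encoding_name}/{c.clock_rate}"
        if c.channels is not None:
            rtpmap += f"/{c.channels}"
        lines.append(f"a=rtpmap:{c.payload_type} {rtpmap}")
        if c.fmtp:
            lines.append(f"a=fmtp:{c.payload_type} {c.fmtp}")
    return "".join(line + "\r\n" for line in lines)


audio = media_section("audio", 49152, [
    RtpCodec(97, "AMR", 8000, 1, "octet-align=1"),
    RtpCodec(98, "telephone-event", 8000, fmtp="0-15"),
])
video = media_section("video", 49154, [
    RtpCodec(99, "H263-2000", 90000, fmtp="profile=0;level=45"),
    RtpCodec(100, "H264", 90000, fmtp="profile-level-id=42400b;packetization-mode=1"),
])
print(audio + video)
```

Listing several payload types on one m= line lets the answerer pick among the offered codecs, which is how mandatory and optional codecs can be declared side by side.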



9.2 RTP payload 

RTP payload formats specified by IETF shall be used for real time media streams. 

RTP payload format for the AMR narrowband speech codec is specified in annex B. 

RTP payload format for the AMR wideband speech codec is specified in annex B. 

RTP payload format for the ITU-T Recommendation H.263 [6] video codec is specified in IETF RFC 2429 [3]. 

RTP payload format for the MPEG-4 visual simple profile level is specified in IETF RFC 3016 [5]. 

RTP payload format for the ITU-T Recommendation H.264 (AVC) [41] video codec is specified in [43], where the 
interleaved packetization mode shall not be used. Receivers shall support both the single NAL unit packetization mode 
and the non-interleaved packetization mode of [43], and transmitters may use either one of these packetization modes. 
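The H.264 (AVC) packetization rules above can be enforced in a few lines. The sketch assumes the packetization-mode parameter of [43] (0 = single NAL unit, 1 = non-interleaved, 2 = interleaved, with 0 inferred when the parameter is absent) and the NAL unit type values found in the first payload octet (1 to 23 for single NAL unit packets, 24 for STAP-A and 28 for FU-A); the function names are hypothetical.

```python
from typing import Dict

SINGLE_NAL_UNIT_MODE = 0
NON_INTERLEAVED_MODE = 1
INTERLEAVED_MODE = 2
STAP_A, FU_A = 24, 28  # aggregation / fragmentation packet types allowed in mode 1


def parse_fmtp(params: str) -> Dict[str, str]:
    """Parse 'key=value; key=value' format parameters from an a=fmtp line."""
    result = {}
    for item in params.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:
            result[key.strip().lower()] = value.strip()
    return result


def accepted_packetization_mode(fmtp: str) -> int:
    """Packetization mode signalled for an H.264 payload type, rejecting interleaved mode."""
    mode = int(parse_fmtp(fmtp).get("packetization-mode", "0"))  # absent means mode 0
    if mode == INTERLEAVED_MODE:
        raise ValueError("interleaved packetization mode shall not be used (clause 9.2)")
    if mode not in (SINGLE_NAL_UNIT_MODE, NON_INTERLEAVED_MODE):
        raise ValueError(f"unknown packetization-mode {mode}")
    return mode


def packet_allowed(mode: int, rtp_payload: bytes) -> bool:
    """Check the NAL unit type in the first payload byte against the negotiated mode."""
    nal_type = rtp_payload[0] & 0x1F
    if 1 <= nal_type <= 23:
        return True  # single NAL unit packet: valid in both modes
    return mode == NON_INTERLEAVED_MODE and nal_type in (STAP_A, FU_A)
```

Because receivers must accept both modes, a transmitter is free to pick whichever suits its NAL unit sizes, for example FU-A fragmentation for large slices in mode 1.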

RTP payload format for the ITU-T Recommendation T.140 [25] text conversation coding is specified in 
IETF RFC 4103 [49]. 

RTP payload format for the telephone-event is specified in IETF RFC 2833 [36]. 

RTP payload format for the DSR Extended Advanced Front-end is specified in [38]. 
Annex A (informative):

Information on optional enhancements 

This annex is intended for informational purposes only and is not an integral part of the present document.

A.1 Video 

This clause gives recommendations for the video codec implementations within 3G PS multimedia terminals. 

Regardless of which specific video codec standard is used, all video decoder implementations should include basic error 
concealment techniques. These techniques may include replacing erroneous parts of the decoded video frame with 
interpolated picture material from previous decoded frames or from spatially different locations of the erroneous frame. 
The decoder should aim to prevent the display of substantially corrupted parts of the picture. In any case, the terminal
should tolerate every possible bitstream without catastrophic behaviour (such as the need for a user-initiated reset of
the terminal).
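The simplest of the concealment techniques mentioned above, replacing a lost area with the co-located area of the previous decoded frame, can be sketched as follows. It assumes the decoder reports lost macroblocks as None in a grid of 8-bit luma sample blocks; the mid-grey fallback used when no reference frame exists is an assumption of this sketch, and practical decoders typically add motion-compensated or spatial interpolation.

```python
from typing import List, Optional

Macroblock = List[List[int]]  # luma samples of one macroblock


def conceal(decoded: List[List[Optional[Macroblock]]],
            previous: Optional[List[List[Macroblock]]] = None,
            mb_size: int = 16) -> List[List[Macroblock]]:
    """Fill lost macroblocks (None) so that corrupted areas are not displayed."""
    out = []
    for r, row in enumerate(decoded):
        new_row = []
        for c, mb in enumerate(row):
            if mb is not None:
                new_row.append(mb)
            elif previous is not None:
                # Zero-motion temporal concealment: copy the co-located macroblock.
                new_row.append([line[:] for line in previous[r][c]])
            else:
                # No reference frame yet: show a flat mid-grey block instead of garbage.
                new_row.append([[128] * mb_size for _ in range(mb_size)])
        out.append(new_row)
    return out
```

Copying from the previous frame works well for static backgrounds; areas with motion show a visible lag until the next correctly received data refreshes them.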

3G PS terminal video encoders and decoders are recommended to support the 1:1 pixel format (square format). 

A.1.1 H.263 video codec

H.263 was approved as a standard in 1996. Since then, version 2 and version 3, which enhance version 1, have been
approved in 1998 and 2000 respectively. H.263 now contains an extensive set of mandatory and optional coding tools.
H.263 [6] annex X [19] defines codec profiles for various target environments. 

The Baseline Profile (Profile 0) stands for H.263 with no optional modes of operation. It includes the basic coding tool 
set common in modern video coding standards. It provides simple means to insert resynchronisation points within the 
video bitstream, and, therefore, it enables recovery from erroneous or lost data. 

The Version 2 Interactive and Streaming Wireless Profile (Profile 3) provides enhanced compression efficiency when 
compared to the Baseline Profile. Moreover, it provides enhanced error resilience for delivery to wireless devices. 
Specifically, Profile 3 includes the following optional coding modes: 

1) Advanced INTRA Coding (annex I). Use of this mode improves the compression efficiency for INTRA 
macroblocks (whether within INTRA pictures or predictively-coded pictures); 

2) Deblocking Filter (annex J). A deblocking filter improves image quality by reducing blocking artifacts. When 
compared to deblocking filtering performed as a postprocessing operation, the Deblocking Filter Mode reduces 
the amount of required memory, as no additional picture memory is needed for the filtered images. This mode 
also includes the four-motion-vector-per-macroblock feature and picture boundary extrapolation for motion 
compensation, both of which can further improve compression efficiency; 

3) Slice Structured Mode (annex K). This mode provides a flexible mechanism to insert resynchronisation points
within the video bitstream for recovery from erroneous or lost data;

4) Modified Quantisation (annex T). This mode enables flexible quantiser control that can be used in sophisticated 
bit-rate control algorithms. In addition, it improves chrominance fidelity. 

[FFS] 

A.2 Audio 

[FFS] 

A.3 Text 

Use of the redundancy coding variant specified in RFC 4103 [49] is recommended for error resilience. 
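The benefit of the redundancy variant can be illustrated with a simplified model: each packet repeats the previous text blocks alongside the new one, so a receiver can recover a lost block from a later packet. This sketch models only that principle. It ignores the actual RFC 4103 and RFC 2198 packet layout, timestamps and transmission timers, numbers packets from 0, and the choice of two redundant generations and of U+FFFD as the loss marker are assumptions; all names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

LOSS_MARKER = "\ufffd"  # assumption: how this sketch shows text that could not be recovered


class RedundantTextSender:
    """Each packet carries up to `generations` earlier blocks plus the new one."""

    def __init__(self, generations: int = 2) -> None:
        self.history: Deque[str] = deque(maxlen=generations)
        self.seq = 0

    def packetize(self, block: str) -> Tuple[int, List[str]]:
        """Return (sequence number, [redundant blocks, oldest first] + [new block])."""
        payload = list(self.history) + [block]
        self.history.append(block)
        seq, self.seq = self.seq, self.seq + 1
        return seq, payload


def reassemble(packets: Dict[int, List[str]], generations: int = 2) -> str:
    """Rebuild the text from the packets that arrived (lost sequence numbers absent)."""
    if not packets:
        return ""
    text = []
    for seq in range(max(packets) + 1):
        if seq in packets:
            text.append(packets[seq][-1])  # the new (primary) block of packet seq
            continue
        recovered = None
        # Block seq is repeated as redundancy in packets seq+1 .. seq+generations.
        for later in range(seq + 1, seq + generations + 1):
            payload = packets.get(later)
            if payload is None:
                continue
            index = len(payload) - 1 - (later - seq)
            if index >= 0:
                recovered = payload[index]
                break
        text.append(recovered if recovered is not None else LOSS_MARKER)
    return "".join(text)
```

With two redundant generations, any run of up to two consecutive lost packets is recovered completely once a later packet arrives; longer bursts leave a visible loss marker.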



ETSI 



3GPP TS 26.235 version 7.3.0 Release 7 12 ETSI TS 126 235 V7.3.0 (2008-01) 

Annex B (normative): 

AMR and AMR-WB RTP payload and MIME type registration

The AMR and AMR-WB speech codec RTP payload, storage format and MIME type registration are specified in [35]. 
Annex C (normative):

ITU-T H.263 MIME media type registration 

NOTE: The intention is to replace this normative annex with the IETF RFC defining the H.263 [6] video codec 
MIME media type registration when the RFC is available. 

H.263 video codec MIME media type is specified as follows: 

MIME media type name: video; 

MIME subtype name: H263-2000; 

Required parameters: None; 

Optional parameters: 

profile: H.263 profile number, in the range 0 through 8, specifying the supported H.263 annexes/subparts;

level: Level of bitstream operation, in the range 0 through 99, specifying the level of computational
complexity of the decoding process. When the profile and level parameters are not specified, Baseline Profile
(Profile 0) and Level 10 are the default values.

The profile and level specifications can be found in [19]. Note that the RTP payload format for H263-2000 is the same 
as for H263-1998 published in RFC 2429 [3], but additional annexes/subparts are specified along with the profiles and 
levels. 
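A receiver applying the parameter rules above might derive the effective profile and level of an H263-2000 payload type from its a=fmtp line as follows. This is a sketch with hypothetical function and constant names; the ranges and defaults are taken from this annex, and the "key=value;key=value" fmtp syntax is an assumption.

```python
from typing import Dict, Tuple

PROFILE_RANGE = range(0, 9)    # profile: 0 through 8 (this annex)
LEVEL_RANGE = range(0, 100)    # level: 0 through 99 (this annex)
DEFAULT_PROFILE, DEFAULT_LEVEL = 0, 10  # Baseline Profile, Level 10


def h263_2000_profile_level(fmtp: str = "") -> Tuple[int, int]:
    """Effective (profile, level) of an H263-2000 payload type, defaults applied."""
    params: Dict[str, str] = {}
    for item in fmtp.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:
            params[key.strip().lower()] = value.strip()
    profile = int(params.get("profile", str(DEFAULT_PROFILE)))
    level = int(params.get("level", str(DEFAULT_LEVEL)))
    if profile not in PROFILE_RANGE or level not in LEVEL_RANGE:
        raise ValueError(f"profile={profile}, level={level} outside the registered ranges")
    return profile, level
```

Each parameter falls back to its default independently, so "level=45" alone is read as Baseline Profile (Profile 0) at Level 45.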
Annex D (informative):
Push-to-Talk over cellular (PoC) 



For PoC, the audio codecs specified in clause 6.1, namely AMR and AMR-WB, are applicable. Speech codec bit rates
and transport format settings have to be selected considering the available transmission bandwidth and the allowable
transport delay. In order not to introduce undue delay through RTP packetization, it is recommended to limit the number
of speech codec frames per packet to 20 and not to use interleaving.

Assuming RTP packetization according to [35] in octet-aligned mode, with no interleaving and 10 frames per RTP
packet, the following tables show the bandwidth required for each AMR and AMR-WB speech codec mode, depending
on the IP version used in IMS. Bandwidth restrictions may imply that only the lowest AMR/AMR-WB modes can be
used for PoC. In order to maximize speech quality, it is recommended to use the highest bit rate that the available
bandwidth allows.

Table 1: Required bandwidth for PoC using AMR

AMR mode | Required bandwidth when IPv4 is used [bits/s] [Note] | Required bandwidth when IPv6 is used [bits/s]
AMR 4.75 | 6840 | 7640
AMR 5.15 | 7240 | 8040
AMR 5.9 | 8040 | 8840
AMR 6.7 | 8840 | 9640
AMR 7.4 | 9640 | 10440
AMR 7.95 | 10040 | 10840
AMR 10.2 | 12440 | 13240
AMR 12.2 | 14440 | 15240

Note: For the usage of IP version in IMS, see TS 23.221 [44], subclause 5.1.



Table 2: Required bandwidth for PoC using AMR-WB

AMR-WB mode | Required bandwidth when IPv4 is used [bits/s] [Note] | Required bandwidth when IPv6 is used [bits/s]
AMR-WB 6.60 | 8840 | 9640
AMR-WB 8.85 | 11240 | 12040
AMR-WB 12.65 | 14840 | 15640
AMR-WB 14.25 | 16440 | 17240
AMR-WB 15.85 | 18040 | 18840
AMR-WB 18.25 | 20440 | 21240
AMR-WB 19.85 | 22040 | 22840
AMR-WB 23.05 | 25240 | 26040
AMR-WB 23.85 | 26040 | 26840

Note: For the usage of IP version in IMS, see TS 23.221 [44], subclause 5.1.
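The figures in Tables 1 and 2 follow from the stated packetization assumptions, and the sketch below recomputes them. It assumes 20 ms speech frames, the octet-aligned payload layout of [35] (one octet for the CMR field and its reserved bits, one table-of-contents octet per frame, each speech frame padded to whole octets) and uncompressed RTP (12 bytes), UDP (8 bytes) and IPv4 (20 bytes) or IPv6 (40 bytes) headers; RTCP, link-layer overhead and header compression are ignored. The bits-per-frame values are each mode's bit rate multiplied by 20 ms, and the names are hypothetical.

```python
# Speech bits per 20 ms frame for each mode (mode bit rate in kbit/s x 20 ms).
AMR_BITS = {"4.75": 95, "5.15": 103, "5.9": 118, "6.7": 134,
            "7.4": 148, "7.95": 159, "10.2": 204, "12.2": 244}
AMR_WB_BITS = {"6.60": 132, "8.85": 177, "12.65": 253, "14.25": 285, "15.85": 317,
               "18.25": 365, "19.85": 397, "23.05": 461, "23.85": 477}

FRAME_MS = 20
RTP_UDP_BYTES = 12 + 8                   # RTP fixed header + UDP header
IP_BYTES = {"IPv4": 20, "IPv6": 40}      # basic IP headers, no options/extensions


def required_bandwidth(speech_bits: int, ip_version: str, frames_per_packet: int = 10) -> int:
    """Bit rate in bit/s of one octet-aligned AMR/AMR-WB RTP stream, headers included."""
    octets_per_frame = (speech_bits + 7) // 8  # pad each speech frame to whole octets
    # Payload: 1 CMR octet, then per frame 1 ToC octet plus the padded speech frame.
    payload = 1 + frames_per_packet * (1 + octets_per_frame)
    packet_bytes = payload + RTP_UDP_BYTES + IP_BYTES[ip_version]
    packet_interval_ms = frames_per_packet * FRAME_MS
    return round(packet_bytes * 8 * 1000 / packet_interval_ms)


for mode, bits in AMR_BITS.items():
    print(f"AMR {mode}: {required_bandwidth(bits, 'IPv4')} / {required_bandwidth(bits, 'IPv6')}")
```

With 10 frames per packet there are 5 packets per second, so the 20 extra bytes of an IPv6 header add 800 bit/s to every row. Raising frames_per_packet to the recommended maximum of 20 halves the header overhead but means each packet spans 400 ms of speech.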